Learning optimally diverse rankings over large document collections
Abstract
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with a scalable algorithm that explicitly takes document similarity and ranking context into account. We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
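The objective named in the abstract, the fraction of satisfied users, can be illustrated with a small sketch. It assumes (this is not stated in the abstract) that a user counts as satisfied when at least one of the top-k documents is relevant to them, and that each user's relevant set is known. The function names `fraction_satisfied` and `greedy_diverse_ranking` are hypothetical, and the greedy routine is a simple offline baseline for comparison, not the learning algorithm the paper presents.

```python
from typing import Hashable, List, Sequence, Set


def fraction_satisfied(
    ranking: Sequence[Hashable],
    user_relevant_sets: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Fraction of users with at least one relevant document in the top k.

    Assumption: a user is 'satisfied' by a single relevant result.
    """
    if not user_relevant_sets:
        return 0.0
    top_k = set(ranking[:k])
    satisfied = sum(1 for relevant in user_relevant_sets if top_k & relevant)
    return satisfied / len(user_relevant_sets)


def greedy_diverse_ranking(
    documents: Sequence[Hashable],
    user_relevant_sets: Sequence[Set[Hashable]],
    k: int,
) -> List[Hashable]:
    """Offline greedy baseline: at each rank, pick the document that
    satisfies the most users who are not yet satisfied."""
    ranking: List[Hashable] = []
    remaining = list(documents)
    unsatisfied = list(range(len(user_relevant_sets)))
    for _ in range(min(k, len(remaining))):
        # Marginal gain depends on what is already ranked above (ranking context).
        best = max(
            remaining,
            key=lambda d: sum(1 for u in unsatisfied if d in user_relevant_sets[u]),
        )
        ranking.append(best)
        remaining.remove(best)
        unsatisfied = [u for u in unsatisfied if best not in user_relevant_sets[u]]
    return ranking


# Toy data (made up for illustration): four users, four documents.
users = [{"a"}, {"a", "b"}, {"c"}, {"d"}]
docs = ["a", "b", "c", "d"]

redundant = ["a", "b"]                             # both documents serve the same users
diverse = greedy_diverse_ranking(docs, users, 2)   # ["a", "c"]
print(fraction_satisfied(redundant, users, 2))     # 0.5
print(fraction_satisfied(diverse, users, 2))       # 0.75
```

In this toy case, ranking documents by independent utility would place "b" second because two users find it relevant in isolation, yet it satisfies no one who was not already satisfied by "a". Scoring each position by its marginal gain is what removes the redundancy.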
Similar resources
Ranked bandits in metric spaces: learning diverse rankings over large document collections
Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable a...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
An Active Learning Approach to Efficiently Ranking Retrieval Engines
Evaluating retrieval systems, such as those submitted to the annual TREC competition, usually requires a large number of documents to be read and judged for relevance to query topics. Test collections are far too big to be exhaustively judged, so only a subset of documents is selected to form the judgment “pool.” The selection method that TREC uses produces pools that are still quite large. Res...
Large Scale Biomedical Concept Mapping
In this paper, we report on the application of simple retrieval strategies to biomedical concept mapping. We aim at evaluating the performance of a learning-free system tailored to map large collections of concepts, as they can be found in health sciences. Our system is seen as a solution in those cases where machine learning approaches cannot be applied for scalability or data unavailability r...
A Model for Multimodal Information Retrieval
Finding useful information from large multimodal document collections such as the WWW without encountering numerous false positives poses a challenge to multimodal information retrieval systems (MMIR). A general model for multimodal information retrieval is proposed by which a user’s information need is expressed through composite, multimodal queries, and the most appropriate weighted combinati...